# Voice Wake Activation - Implementation Guide

## Overview

Voice wake activation allows users to control ATOM agents using natural voice commands through the menu bar companion.
**Current Status**: Foundation Complete
- ✅ Backend voice command processor
- ✅ Frontend voice client
- ✅ Voice feedback UI
- ⏳ Native speech recognition (platform-specific)
---
## Architecture

### Components
- **Voice Command Processor** (Python)
  - Natural language intent classification
  - Command routing and execution
  - Agent integration
- **Voice Commands Client** (TypeScript)
  - Speech Recognition API wrapper
  - WebSocket communication
  - React hooks for UI integration
- **Voice Feedback UI** (React)
  - Listening state indicator
  - Transcript display
  - Audio waveform visualization
- **Native Integration** (Rust/Tauri)
  - Platform-specific speech recognition
  - Wake word detection
  - Audio capture
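The components above pass voice commands from client to backend as JSON. As a sketch, a server-side helper like the following (hypothetical; field names taken from the `/api/voice/command` example later in this guide) could validate a payload before routing it to the processor:

```python
# Hypothetical server-side validation of an incoming voice-command payload.
# Field names follow the /api/voice/command example; the helper itself is
# illustrative, not part of the actual codebase.
from typing import Any

REQUIRED_FIELDS = {"command": str, "confidence": float, "agent_id": str}

def validate_command_payload(payload: dict[str, Any]) -> list[str]:
    """Return a list of validation errors; empty means the payload is usable."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            errors.append(f"wrong type for {name}")
    confidence = payload.get("confidence")
    if isinstance(confidence, float) and not 0.0 <= confidence <= 1.0:
        errors.append("confidence must be between 0 and 1")
    return errors
```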
---
## Current Implementation

### Backend (Complete)

#### Voice Command Processor
```python
from core.voice_command_processor import VoiceCommandProcessor

processor = VoiceCommandProcessor(db)
result = await processor.process_voice_command(
    tenant_id="tenant-123",
    agent_id="agent-456",
    command="Open dashboard",
    confidence=0.95
)
```

#### API Endpoint
```http
POST /api/voice/command
Content-Type: application/json
Authorization: Bearer <token>

{
  "command": "Execute sales report",
  "confidence": 0.92,
  "agent_id": "agent-456"
}
```

### Frontend (Complete)
#### Voice Command Client

```tsx
import { useVoiceCommands } from '@/lib/voice-commands';

function VoiceControl() {
  const { isListening, transcript, startListening, sendCommand } =
    useVoiceCommands('tenant-123');

  return (
    <>
      <button onClick={startListening}>Start Listening</button>
      {isListening && <p>Listening...</p>}
      {transcript && <p>You said: {transcript}</p>}
    </>
  );
}
```

#### Voice Feedback Component
```tsx
import { VoiceFeedback } from '@/components/menu-bar/VoiceFeedback';

<VoiceFeedback
  isListening={true}
  transcript="Open dashboard"
  response="Opening dashboard now"
/>
```

---
## Platform-Specific Implementation

### macOS (Recommended for First Release)

#### Native Speech Recognition
```rust
// src-tauri/src/voice_wake_macos.rs
use cocoa::base::id;
use cocoa::foundation::NSString;

pub struct MacOSVoiceWake {
    recognizer: id, // SFSpeechRecognizer
    is_listening: bool,
}

impl MacOSVoiceWake {
    pub fn new(wake_word: &str) -> Self {
        // Initialize SFSpeechRecognizer
        // Request microphone permissions
        // Configure for continuous listening
        Self {
            recognizer: create_speech_recognizer(),
            is_listening: false,
        }
    }

    pub fn start_listening(&mut self) {
        // Start real-time speech recognition
        // Detect wake word in audio stream
    }
}
```

#### Permissions
macOS privacy permissions are declared as usage-description keys in the app's `Info.plist` (placed in `src-tauri/`, where Tauri merges it into the bundle), not in `tauri.conf.json`:

```xml
<!-- src-tauri/Info.plist -->
<key>NSMicrophoneUsageDescription</key>
<string>ATOM needs microphone access to listen for voice commands.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>ATOM uses speech recognition to understand voice commands.</string>
```

### Windows (Future)

#### Windows Speech Platform
```rust
// src-tauri/src/voice_wake_windows.rs
use windows::Win32::Media::Speech::*;

pub struct WindowsVoiceWake {
    recognizer: ISpeechRecognizer,
}

impl WindowsVoiceWake {
    pub fn new(wake_word: &str) -> Self {
        // Initialize Windows Speech Platform
        Self {
            recognizer: create_speech_recognizer(),
        }
    }
}
```

### Linux (Future)

#### Pocketsphinx Integration
```rust
// src-tauri/src/voice_wake_linux.rs
use pocketsphinx::*;

pub struct LinuxVoiceWake {
    decoder: ps_decoder_t,
}

impl LinuxVoiceWake {
    pub fn new(wake_word: &str) -> Self {
        // Initialize Pocketsphinx
        // Load acoustic model
        Self {
            decoder: create_decoder(),
        }
    }
}
```

---
## Supported Commands

### Command Types

#### 1. Agent Execution

```
"Hey ATOM, execute reconcile inventory"
"ATOM, run the sales report"
"Send a summary email to John"
```

#### 2. Dashboard Navigation

```
"Open my dashboard"
"Go to dashboard"
"Show me the dashboard"
```

#### 3. Status Queries

```
"Check agent health"
"What's the status?"
"How are the agents doing?"
```

#### 4. Quick Actions

```
"Pause all agents"
"Enable autonomous mode"
"Show recent interventions"
```

---
## Usage Flow

### User Interaction

1. **Wake Word Detection**
2. **Command Listening**
3. **Intent Classification**
4. **Action Execution**
5. **Feedback**
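The five steps above can be sketched as a single pipeline. Every callable here is a placeholder for the platform-specific components described earlier; only the order of operations is the point:

```python
# Hypothetical five-stage pipeline for one utterance. The callables are
# placeholders for the platform-specific components described above.
from typing import Callable

def handle_utterance(
    audio: bytes,
    detect_wake_word: Callable[[bytes], bool],  # 1. Wake Word Detection
    transcribe: Callable[[bytes], str],         # 2. Command Listening
    classify: Callable[[str], str],             # 3. Intent Classification
    execute: Callable[[str, str], str],         # 4. Action Execution
) -> str:
    """Return feedback text for the user (step 5), or '' while still idle."""
    if not detect_wake_word(audio):
        return ""  # stay idle until the wake word is heard (local, no cloud)
    command = transcribe(audio)
    intent = classify(command)
    return execute(intent, command)  # feedback string spoken/shown to the user
```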
---
Configuration
Wake Word Settings
const voiceConfig = {
wakeWord: "Hey ATOM",
sensitivity: 0.7, // 0-1, higher = more sensitive
language: "en-US",
continuous: false, // Stop after one command
requireConfirmation: true // Ask for confirmation on sensitive actions
};Command Settings
```typescript
const commandConfig = {
  confidenceThreshold: 0.7, // Minimum confidence to process
  timeout: 5000,            // Command timeout (ms)
  maxRetries: 3,            // Maximum recognition retries
};
```

---
## Permissions & Security

### Required Permissions

#### macOS

- `microphone` - Audio capture
- `speech-recognition` - Speech processing

#### Windows

- Microphone access
- Speech recognition access

#### Linux

- Audio input device
- Speech recognition library
### Security Measures
- **Local Processing**
  - Wake word detected locally
  - No audio sent to the cloud until the wake word is heard
- **Authentication**
  - All commands authenticated via JWT
  - Tenant-isolated command execution
- **Audit Logging**
  - All voice commands logged
  - Includes transcript, timestamp, and result
- **Confirmation Required**
  - Sensitive commands require confirmation
  - User can review before execution
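A minimal sketch of the audit-logging measure above; the field names are illustrative, not the actual logging schema:

```python
# Hypothetical audit-log record for one voice command. Field names are
# illustrative; the real schema may differ.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class VoiceCommandAuditEntry:
    tenant_id: str
    transcript: str  # what the user said
    result: str      # outcome of command execution
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def log_voice_command(entry: VoiceCommandAuditEntry) -> dict:
    """Serialize the entry; a real implementation would persist it."""
    return asdict(entry)
```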
---
## Error Handling

### Common Errors

#### Error: Speech Recognition Not Supported
```typescript
if (!client.isSupported()) {
  showNotification(
    "Voice commands not supported in this browser",
    "error"
  );
}
```

#### Error: Low Confidence Recognition
```typescript
if (confidence < 0.7) {
  return {
    success: false,
    message: "I didn't catch that. Could you repeat?",
    requires_confirmation: true
  };
}
```

#### Error: Microphone Access Denied
```typescript
navigator.mediaDevices.getUserMedia({ audio: true })
  .catch(err => {
    showNotification(
      "Microphone access denied. Please enable it in browser settings.",
      "error"
    );
  });
```

---
## Testing

### Manual Testing
- **Test Wake Word**
- **Test Commands**
- **Test Low Confidence**
- **Test Background Noise**
### Automated Tests

```typescript
// Test voice command classification
describe('Voice Command Processor', () => {
  it('should classify dashboard command', async () => {
    const result = await processor.process_voice_command(
      tenant_id,
      agent_id,
      "Open dashboard",
      0.95
    );
    expect(result.action).toBe('open_url');
  });

  it('should reject low confidence commands', async () => {
    const result = await processor.process_voice_command(
      tenant_id,
      agent_id,
      "Mumble mumble",
      0.5
    );
    expect(result.success).toBe(false);
  });
});
```

---
## Performance

### Latency Targets
| Operation | Target | Actual |
|---|---|---|
| Wake word detection | < 500ms | TBD |
| Speech to text | < 1s | TBD |
| Intent classification | < 500ms | TBD |
| Command execution | < 2s | TBD |
| End-to-end | < 3s | TBD |
### Optimization Strategies

- **Local Wake Word Detection**
  - Reduce cloud dependency
  - Faster response time
- **Intent Caching**
  - Cache common command intents
  - Faster classification
- **Audio Processing**
  - Downsample audio for faster processing
  - Noise reduction for better accuracy
---
## Future Enhancements

### Phase 2 (Q2 2025)
- [ ] Windows speech recognition
- [ ] Linux speech recognition (Pocketsphinx)
- [ ] Custom wake word training
- [ ] Voice biometrics for security
### Phase 3 (Q3 2025)
- [ ] Multi-language support
- [ ] Context-aware commands
- [ ] Command history and replay
- [ ] Voice shortcuts/macros
### Phase 4 (Q4 2025)
- [ ] Natural conversation mode
- [ ] Follow-up questions
- [ ] Multi-turn dialogues
- [ ] Voice-activated workflows
---
## Troubleshooting

### Issue: Wake word not detected

**Solutions**:
- Check microphone permissions
- Adjust sensitivity setting
- Speak closer to microphone
- Reduce background noise
### Issue: Commands misclassified

**Solutions**:
- Speak clearly and at moderate pace
- Use exact command phrases
- Check confidence score
- Review command logs
### Issue: High error rate

**Solutions**:
- Train custom wake word model
- Adjust sensitivity threshold
- Improve microphone quality
- Reduce ambient noise
---
**Status**: Foundation Complete
**Platform Support**: macOS (ready), Windows (planned), Linux (planned)
**Production Ready**: With macOS native integration